
DAOS-16964 ddb: read-only mode requires enabling Copy-on-Write for...#17533

Open
janekmi wants to merge 7 commits into master from janekmi/DAOS-16964-read-only-cow

Conversation

Contributor

@janekmi janekmi commented Feb 9, 2026

... PMEMOBJ pools

PMEMOBJ maintains its own metadata. Copy-on-Write prevents these changes from taking effect so read-only mode will be truly read-only.

Also removing the mlock() workaround because:

  • PMEM + Copy-on-Write + mlock() leads to increased memory usage, since the entire pool is pulled into RAM when it is opened, while mlock() serves no purpose there.
  • mlock() has been unnecessary for quite some time. It was originally added to work around a cryptic issue observed when using libfabric with the verbs provider and performing direct RDMA writes into pool memory. Direct RDMA writes to pool memory are no longer used, so the workaround is obsolete. Please see the ticket for the complete paper trail.
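The mechanism described above can be sketched roughly as follows. This is an illustrative sketch only: map_pool() and its signature are assumptions for this example, not actual DAOS or PMDK symbols.

```c
/* Illustrative sketch (not DAOS/PMDK code): in read-only mode the pool is
 * mapped MAP_PRIVATE (copy-on-write), so PMEMOBJ's in-place metadata
 * updates land in volatile private pages and never reach the backing
 * file; no mlock() is involved. */
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

static void *map_pool(const char *path, bool write_mode, size_t *len_out)
{
	/* A copy-on-write mapping does not require write access to the file. */
	int fd = open(path, write_mode ? O_RDWR : O_RDONLY);
	if (fd < 0)
		return NULL;

	struct stat st;
	if (fstat(fd, &st) != 0) {
		close(fd);
		return NULL;
	}

	/* PROT_WRITE is still needed in read-only mode because PMEMOBJ
	 * updates its metadata in place; MAP_PRIVATE keeps those writes
	 * volatile instead of flushing them to the pool file. */
	int   flags = write_mode ? MAP_SHARED : MAP_PRIVATE;
	void *base  = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE, flags, fd, 0);

	close(fd); /* the mapping holds its own reference to the file */
	if (base == MAP_FAILED)
		return NULL;

	*len_out = (size_t)st.st_size;
	return base;
}
```

With write_mode == false, stores into the returned mapping hit private RAM pages that the kernel discards at munmap(), which is also why pinning all of those pages with mlock() would only inflate memory usage.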

Steps for the author:

  • Commit message follows the guidelines.
  • Appropriate Features or Test-tag pragmas were used.
  • Appropriate Functional Test Stages were run.
  • At least two positive code reviews including at least one code owner from each category referenced in the PR.
  • Testing is complete. If necessary, forced-landing label added and a reason added in a comment.

After all prior steps are complete:

  • Gatekeeper requested (daos-gatekeeper added as a reviewer).

…EMOBJ pools

PMEMOBJ maintains its own metadata. Copy-on-Write prevents these changes from
taking effect so read-only mode will be truly read-only.

Signed-off-by: Jan Michalski <jan-marian.michalski@hpe.com>
@github-actions

github-actions bot commented Feb 9, 2026

Ticket title is 'Opening a VOS file in read only mode with DDB modifies the VOS file'
Status is 'In Review'
Labels: 'scrubbed_2.8'
Job should run at elevated priority (1)
https://daosio.atlassian.net/browse/DAOS-16964

Nasf-Fan
Nasf-Fan previously approved these changes Feb 10, 2026
@knard38
Contributor

knard38 commented Feb 10, 2026

LGTM.

This patch was manually tested and the VOS files are indeed no longer updated when opened in read-only mode.

Without this patch, the md5sum differs for the same use case.

knard38
knard38 previously approved these changes Feb 10, 2026
Contributor

@knard38 knard38 left a comment

LGTM

Doc-only: true

Signed-off-by: Jan Michalski <jan-marian.michalski@hpe.com>
@janekmi janekmi dismissed stale reviews from knard38 and Nasf-Fan via 7baa0b6 February 12, 2026 15:46
@janekmi janekmi requested a review from grom72 February 12, 2026 15:47
…read-only-cow

Signed-off-by: Jan Michalski <jan-marian.michalski@hpe.com>
Doc-only: true

Signed-off-by: Jan Michalski <jan-marian.michalski@hpe.com>
grom72
grom72 previously approved these changes Feb 23, 2026
@janekmi janekmi requested review from Nasf-Fan and knard38 February 23, 2026 12:11
knard38
knard38 previously approved these changes Feb 23, 2026
@janekmi janekmi requested a review from a team February 23, 2026 12:27
* However, since none of these changes need to be persisted when the pool is opened in
* read‑only mode (write_mode == false), we can work around this by mapping the pool using
* copy‑on‑write. Copy‑on‑write allows pages to be read normally, but when a page is
* modified, a new private copy is allocated. As a result, any changes made to
Contributor

@Nasf-Fan Nasf-Fan Feb 24, 2026

If a new private copy is allocated, then when the vos pool is opened again, will the operations (read/write) work against the original or the new copy?

On the other hand, under non-interactive mode, many ddb sub-commands will open the vos pool implicitly as read-only, such as ddb ls. If this is repeated many times (for different items), will that cause trouble for copy-on-write? In an extreme case, may ddb ls fail because of ENOSPC?

Contributor Author

If a new private copy is allocated, then when the vos pool is opened again, will the operations (read/write) work against the original or the new copy?

This is a volatile copy of modified pages in RAM. These pages are freed once you close the pool / call munmap().

Please see mmap() and MAP_PRIVATE for details.
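To make the MAP_PRIVATE semantics concrete, here is a minimal plain-POSIX sketch (not DAOS code; cow_demo is a made-up name for this example) showing that a write to a copy-on-write mapping never reaches the backing file and that the private copy disappears at munmap():

```c
/* Minimal POSIX demo (not DAOS code): writes to a MAP_PRIVATE mapping go
 * to private, volatile pages in RAM; the backing file stays untouched. */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static int cow_demo(void)
{
	char path[] = "/tmp/cow_demo_XXXXXX";
	int  fd = mkstemp(path);

	if (fd < 0 || write(fd, "original", 8) != 8)
		return -1;

	char *map = mmap(NULL, 8, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
	if (map == MAP_FAILED)
		return -1;

	memcpy(map, "modified", 8);   /* first store triggers copy-on-write */

	/* The mapping sees the new bytes; the file still holds the old ones. */
	char buf[8];
	if (pread(fd, buf, 8, 0) != 8)
		return -1;
	int file_unchanged = (memcmp(buf, "original", 8) == 0);
	int map_modified   = (memcmp(map, "modified", 8) == 0);

	munmap(map, 8);               /* private copies are freed here */
	close(fd);
	unlink(path);
	return (file_unchanged && map_modified) ? 0 : -1;
}
```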

On the other hand, under non-interactive mode, many ddb sub-commands will open the vos pool implicitly as read-only, such as ddb ls. If this is repeated many times (for different items), will that cause trouble for copy-on-write? In an extreme case, may ddb ls fail because of ENOSPC?

If it is truly read-only, I assume no changes will be made no matter the number of items being processed. So I expect that after startup, once a few copied pages have been allocated (for various PMEMOBJ purposes), the memory consumption in this department ought to stay flat.

If, despite the assumption of not making any changes to the pool, VOS actually writes to the pool when the user (DDB) requests merely to read the data, the memory consumption will continue to grow. The final signal would be ENOMEM.

If the latter is the case, I would argue we should rethink whether we are even able to provide a read-only mode.

Contributor

I am not sure about this. It is better to run some tests with your patch, such as a repeated ddb ls.

Contributor

If the latter is the case, I would argue we should rethink whether we are even able to provide a read-only mode.

I do not know how the implicit modification is triggered when opening the vos-pool, or what the potential side effects are. According to your description in DAOS-16964, both the vos file timestamp and some other metadata are changed.

Will that potentially damage the vos-pool? If it is harmful, does it mean that even if we do not run ddb, the related risk is still there when the DAOS engine normally opens the vos-pool? Otherwise, if it is harmless, can we ignore such implicit modifications to avoid CoW trouble?

Contributor Author

Will that potentially damage the vos-pool?

Yes, it can. Imagine, e.g., that the PMEMOBJ ulog is corrupted. With bad luck, PMEMOBJ may apply these malicious changes and further mangle the VOS pool contents, which may prevent us from finding the root cause of whatever problem we are trying to pinpoint.

does it means even if we did not run ddb, related risk still be there when DAOS engine normally opens the vos-pool?

Yes. At the moment, opening the VOS pool always does the same things underneath, no matter whether the open is supervised by DDB or daos_engine.

can we ignore such implicit modification to avoid CoW trouble?

It looks like it is not harmless in general. Regardless, I think CoW is worth the hassle because, in my opinion, we need a read-only mode. Otherwise, DDB is not really a debug tool, since we cannot control what it does to the VOS file.


  1. I added a test which shows that CoW prevents changes from reaching the backing file, and as a result the mtime also stays the same. 😁
  2. I have done some measurements to check the memory consumption in both cases. TBH the results are odd. I wrote my observations in the ticket: https://daosio.atlassian.net/browse/DAOS-16964?focusedCommentId=153817 Regardless, I think we should proceed with this patch and maybe create a new ticket to understand better why this is the case, which would be beneficial in three ways:
  • we could potentially decrease the memory usage, assuming some of these operations are not necessary for DDB to function;
  • if these operations are not optimal, maybe there is room for performance gains for both DDB and daos_engine;
  • it is not good that both of us are surprised by this result (I assume you did not know this was the case, did you? 😉 ), so whatever we find out will improve our understanding of how VOS really works.

@Nasf-Fan Nasf-Fan self-requested a review February 24, 2026 03:12
@daltonbohning
Contributor

Master PRs currently require merge approval. I added request_for_2.8 to the ticket

@mchaarawi
Contributor

@janekmi please address the question on the PR from @Nasf-Fan
@Nasf-Fan please approve once you finish your review.

@mchaarawi mchaarawi removed the request for review from a team February 24, 2026 15:51
Signed-off-by: Jan Michalski <jan-marian.michalski@hpe.com>
@janekmi janekmi dismissed stale reviews from knard38 and grom72 via 5e28aaf March 2, 2026 20:37
@github-actions github-actions bot added the priority Ticket has high priority (automatically managed) label Mar 2, 2026
@janekmi janekmi requested review from grom72 and knard38 March 2, 2026 20:38
knard38
knard38 previously approved these changes Mar 4, 2026
Contributor

@knard38 knard38 left a comment

LGTM

Ref: #7153

Signed-off-by: Jan Michalski <jan-marian.michalski@hpe.com>
@janekmi janekmi requested review from a team as code owners March 5, 2026 10:32
@janekmi janekmi requested a review from knard38 March 5, 2026 11:12
knard38
knard38 previously approved these changes Mar 5, 2026
Nasf-Fan
Nasf-Fan previously approved these changes Mar 5, 2026
Contributor

@Nasf-Fan Nasf-Fan left a comment

Thanks for the investigation @janekmi

Contributor

Shall we also remove this munlock() function call?

Contributor Author

Done.

Signed-off-by: Jan Michalski <jan-marian.michalski@hpe.com>
@janekmi janekmi dismissed stale reviews from Nasf-Fan and knard38 via 1de73f2 March 6, 2026 12:00
@janekmi janekmi requested review from Nasf-Fan, grom72 and knard38 March 6, 2026 12:02
Contributor

@grom72 grom72 left a comment

.


Labels

priority Ticket has high priority (automatically managed)
